Abstract Reforestation in the humid tropics and arid zones,
where trees are often subject to stresses, is an effective
approach for mitigating global warming. Forestation with
Populus species that are tolerant to the stresses in such
regions has been conducted. The selection of poplar trees
with higher stress tolerance leads to more efficient
reforestation. Genome-wide bioinformatics approaches to
gene function have been used to reveal the mechanisms
of biological processes, including such stress tolerance. The
decoding of the poplar genome has been followed by the
genome-wide identification of genes and then the inference
of gene function for systematic understanding of biological
processes. To predict gene function in poplar, we analyzed
poplar gene expression data using DNA microarray datasets
obtained from the Gene Expression Omnibus database
and developed a database for poplar gene co-expression
analysis. Using the database, we illustrate the steps to
retrieve two groups of co-expressed genes that are
specifically expressed in experiments of hypoxic stress
response in gray poplar, a flooding-tolerant tree species.
Our database allows users to extract genes involved in
biological processes, such as stress reaction, and is
therefore useful for understanding such mechanisms in tree
species.
Introduction

Reforestation in humid tropics and arid zones is an effective
approach for mitigating global warming. Forestation with
Populus species that are tolerant to stresses in such regions
has been studied; e.g., in the Chinese arid zone by the Japan
Association for Greening Deserts project
(http://www.sabakuryokka.org/). Poplar trees adapt to
dry, 2-5 cold, 6,7 and flooding regions 8 through stress
responses. Understanding stress responses in poplar allows
the selection of poplar trees suited to surviving in such
regions, which in turn leads to better reforestation.
Previous reports have revealed the functions of individual
genes involved in stress responses. A genome-wide analysis
of poplar gene function is an approach to predicting
mechanisms of biological processes, including such stress
tolerance.

Y. Ogata • H. Suzuki • D. Shibata (✉)
Department of Biotechnology Research, Kazusa DNA Research
Institute, 2-6-7 Kazusa-Kamatari, Kisarazu, Chiba 292-0818, Japan
Tel. +81-438-52-3900; Fax +81-438-52-3948
e-mail: shibata@kazusa.or.jp

Part of this report was presented at the 59th Annual Meeting of the
Japan Wood Research Society, Matsumoto, March 2009

In this decade, the genomes of plants such as Arabidopsis
thaliana and rice have been decoded, and their genomic
information has been used to analyze the expression of
genome-scale transcripts (mRNA), or transcriptomes, using
DNA microarray technology and bioinformatics approaches.
Bioinformatics approaches have been developed along with
the explosive accumulation of omics data, such as those of
the genome, transcriptome, and proteome, and the rapid
progress of computer technology. The outputs of such
approaches are not conclusive in themselves; i.e., the
identification of gene function requires experimental
evidence. Nevertheless, because they are genome-wide and
high-throughput, these approaches contribute to the
prediction of gene function.

The genome draft of black cottonwood (Populus trichocarpa)
was published in 2006 by Tuskan et al., 19 and poplar
DNA microarray techniques have been developed. Recent
studies have analyzed the poplar transcriptome using
bioinformatics approaches, and the analyzed data have
been deposited in public databases such as AspenDB
(http://aspendb.uga.edu). Such transcriptome analyses in
poplar have promoted the gene-level understanding of
various types of biological processes, including stress
responses. The tendency of genes involved in a common
process to show a similar expression profile, or be
“co-expressed,” has been reported. 21,22 Therefore,
co-expression analyses have been used to predict functional
relatedness such as the commonality of metabolic
pathways, 23 protein complexes, 24 and stress responses. 25
Co-expression analysis in poplar has been performed
since the decoding of the genome. Gronlund et al.
performed co-expression analysis to reconstruct the
genome-wide co-expression network, in which genes are
linked to other genes on the basis of gene-to-gene
similarity in their expression profiles. Analyses using
co-expression networks in plants have provided systematic
information on gene-to-gene associations through microarray
analyses performed in various experiments. 27,28 Information
on the experiments contributing to such associations
promotes our understanding of functional relatedness
between genes and of biological processes such as the
stress-response mechanism.

We constructed a poplar co-expression database for the
genome-wide prediction of gene function based on poplar
co-expression analysis. Ninety-five publicly available DNA
microarray datasets were obtained from the Gene Expression
Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/).
We performed a co-expression network analysis using the
datasets and extracted “co-expression modules,” comprising
co-expressed genes that are tightly interconnected with
each other. Information on the experiments contributing to
gene-to-gene connections in co-expression modules was
associated with the modules. As examples of extracting
co-expression modules, we demonstrate two modules that are
composed of genes expressed in hypoxic stress-response
experiments. Our database is available at
http://webs2.kazusa.or.jp/kagiana/cop/.
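
The idea of linking genes by expression similarity and then
grouping interconnected genes can be sketched as follows. This
is only an illustration in Python: the threshold of 0.6, the
gene names, and the use of connected components as a stand-in
for module extraction are assumptions, and the sketch does not
reproduce the procedure used to build the database.

```python
from collections import deque


def build_network(corr, names, threshold=0.6):
    """Link two genes when their correlation is at or above the threshold."""
    graph = {name: set() for name in names}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i][j] >= threshold:
                graph[names[i]].add(names[j])
                graph[names[j]].add(names[i])
    return graph


def connected_groups(graph):
    """Return groups of mutually reachable genes (connected components)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical correlation matrix for four genes (illustrative values only).
names = ["geneA", "geneB", "geneC", "geneD"]
corr = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.0],
    [0.8, 0.7, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]
groups = connected_groups(build_network(corr, names))
```

Connected components are the loosest possible grouping; a
module in the sense used here requires tight interconnection
among its members, so a real analysis would apply a stricter
density criterion.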


Materials and methods 

We obtained gene expression datasets of 95 chips of the
Affymetrix Poplar Genome Array (Affymetrix), which is
composed of 61413 probes representing genes, from GEO. The
data files were processed using Bioconductor 2.3.13 with R
version 2.8.1 to obtain text-formatted data. Gene expression
data in the files were standardized, and cosine correlation
coefficients between all pairs of probes were calculated
using our R program.
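
The standardization and correlation steps can be sketched as
follows. This is a minimal illustration in Python with NumPy,
not the R program used in this study; the function name
cosine_correlation_matrix, the choice of per-probe z-score
standardization, and the toy matrix are assumptions.

```python
import numpy as np


def cosine_correlation_matrix(expr):
    """Return the probe-by-probe cosine correlation matrix.

    expr: 2-D array, rows = probes, columns = arrays (chips).
    Assumption: each probe is standardized across arrays (z-score).
    """
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant probes
    z = (expr - mean) / std
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # constant probes stay as zero vectors
    unit = z / norms
    return unit @ unit.T


# Toy data: 3 hypothetical probes measured on 4 arrays.
toy = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same profile as probe 0, scaled
    [4.0, 3.0, 2.0, 1.0],  # opposite profile
])
corr = cosine_correlation_matrix(toy)
```

For all 61413 probes the dense matrix would hold about
3.8 × 10^9 entries, so a practical run would likely compute it
in blocks of rows rather than in a single product.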

For functional characterization of a poplar co-expression
module, information on the functions of the genes comprising
the module may be insufficient. For better biological
understanding of a poplar co-expression module, we used
information on the functions of the Arabidopsis genes that
are most homologous to the poplar co-expressed genes, as
identified using BLAST. As an index of homology, the
harmonic mean of the mutual identity values was calculated,
referred to as the homology F-measure (HF).
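
The harmonic mean underlying HF can be written as a short
function. This is a sketch in Python under stated assumptions:
the two arguments stand for the identity of the poplar gene
against its Arabidopsis hit and the identity in the reverse
direction, both as fractions between 0 and 1; how these
identities are derived from the BLAST output is not specified
here, and the example values are hypothetical.

```python
def homology_f_measure(identity_ab, identity_ba):
    """Harmonic mean of two mutual identity values (fractions in [0, 1])."""
    if identity_ab < 0 or identity_ba < 0:
        raise ValueError("identities must be non-negative")
    total = identity_ab + identity_ba
    if total == 0:
        return 0.0
    return 2.0 * identity_ab * identity_ba / total


# Hypothetical identities: 0.80 in one direction, 0.60 in the other.
hf = homology_f_measure(0.80, 0.60)
```

Because the harmonic mean is pulled toward the smaller value,
a pair scores highly only when the identity is high in both
directions.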


Results 

We constructed a database for poplar co-expression analysis
using publicly available DNA microarray datasets; it is
managed under the CoP database management system. It allows
users to perform co-expression analyses of Arabidopsis,
poplar, and soybean and to retrieve biological processes
based on gene ontology. 29 By querying a gene of interest,
users can access a page providing information on the
co-expression module that is composed of genes co-expressed
with the gene. In the portal page of the database (Fig. 1),
the steps to input the required items are as follows:
(I) input a query word, e.g., a probe identifier, gene
identifier, or Arabidopsis gene identifier (AGI code);
(II) select a plant organism, e.g., select “Populus
trichocarpa (poplar) Affymetrix”; (III) select an
information type, e.g., select “Confeito (co-expression
analysis)”; (IV) skip this step, which is not available for
this purpose; and then (V) click the submit button.

The summarized description of a co-expression module is
composed of three parts (Fig. 2). The first part includes
the query gene identifier, the tightness index of the module
(referred to as the network F-measure; NF), and the module
size (the number of genes). The second part presents
“Descriptions” of the individual genes included in the
module, composed of probe identifiers, representative
public identifiers, and information on the homologous
Arabidopsis genes; i.e., AGI code, HF, gene names, short
descriptions of the function, and GO biological processes.
The third part provides information on “Specific
Experiments,” in which the co-expressed genes are
specifically expressed, composed of standardized scores of
the genes, sample names, experimental identifiers of GEO,
links to detailed descriptions of the experiments, and
experiment titles. From these descriptions, users can obtain
information on the module members that are specifically
expressed under particular experiments.
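
How a standardized score can single out the samples in which a
module is specifically expressed is sketched below. This is an
illustration in Python only: the way the database computes its
scores is not described here, and the function name, the
cutoff of 2.0, the use of module-average expression, and the
sample values are all assumptions.

```python
import statistics


def specific_samples(sample_values, cutoff=2.0):
    """Return (sample, z-score) pairs whose z-score is at or above the cutoff."""
    values = list(sample_values.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    scored = [(name, (value - mean) / sd) for name, value in sample_values.items()]
    return sorted(
        [(name, z) for name, z in scored if z >= cutoff],
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical module-average expression values for six samples.
module_means = {
    "sample1": 5.0, "sample2": 5.1, "sample3": 4.9,
    "sample4": 5.0, "sample5": 5.0, "sample6": 9.0,
}
hits = specific_samples(module_means)
```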

To show users how to retrieve co-expression modules, we
illustrate the steps with the following example. In the
portal site (http://webs2.kazusa.or.jp/kagiana/cop/):

I. Input “confeito,” which is the keyword for retrieving
co-expression modules with tight intramodular
interconnections.

II. Select “Populus trichocarpa (poplar) Affymetrix.”

III. Select “Confeito (co-expression analysis).”

IV. Make no selection.

V. Click the “Submit” button.

Next, in the page listing the co-expression modules:

VI. Click “Ptp.4.1.A1_a_at” as an example probe identifier,
whose Arabidopsis homologous gene is At5g15630,
named “COBL4/IRX6 (COBRA-LIKE4).”

Then, a summarized description of the co-expression
module that is composed of genes co-expressed with the
query gene is displayed (see Fig. 2). The “Descriptions”
part of this module is composed of functional descriptions
of genes that may be involved in cell wall biosynthesis;
i.e., “TUB2 (Tubulin beta-2),” “microtubule-associated
protein (MAP65/ASE1) family,” “COBL4/IRX6
(COBRA-LIKE4),” “FLA11 (fasciclin-like
arabinogalactan-protein 11),” and “GAUT12/IRX8/LGT6
(GALACTURONOSYLTRANSFERASE 12).” In the “Specific
Experiments” part, the “Z-scores” of two samples for an
experiment for root hypoxia (GSM328533 and GSM328532) are
high (6.8 and 6.7, respectively), indicating that these
module members are specifically expressed through root
hypoxic stress response.

Fig. 1. The retrieval form of the CoP database. The
retrieval steps are as follows: (I) input a query word,
e.g., an AGI code or a microarray probe name; (II) select
“Populus trichocarpa (poplar) Affymetrix”; (III) select
“Confeito (co-expression analysis)”; (IV) skip this step;
and (V) click the “Submit” button. When a single candidate
gene is found, a page including information on a
co-expression module that is composed of genes co-expressed
with the gene is directly displayed. Otherwise, a page with
a list of gene identifiers that are linked to pages
including information on co-expression modules is displayed

[Form shown in Fig. 1: (I) Input a query word; (II) Select a
plant organism; (III) Select an information type (“Confeito
(co-expression analysis)” or “Biological process”);
(IV) Select an additional option. When “Biological process”
is selected in Step III, select evidence code categories:
X (experimental): EXP, IDA, IPI, IMP, IGI, IEP;
S (statement): TAS, IC; C (computational): ISS, ISO, ISA,
ISM, IGC, RCA; L (electronic): IEA; N (not available):
NAS, ND]

A query of “PtpAffx.14026.1.S1_s_at” returned a
co-expression module that also included genes involved in
cell wall biosynthesis; i.e., the descriptions of the
Arabidopsis genes orthologous to the poplar genes in the
module included “LAC10 (laccase 10),” “AGP10
(arabinogalactan protein 10),” “CESA8 (CELLULOSE SYNTHASE
8),” “TUA6 (tubulin alpha-6 chain),” “caffeoyl-CoA
3-O-methyltransferase, putative,” “IRX9 (IRREGULAR
XYLEM 9),” and “AGP2 (ARABINOGALACTAN-PROTEIN 2).” In the
“Specific Experiments” part, three samples for an
experiment for root hypoxia (GSM328484, GSM328533, and
GSM328532) showed high Z-scores (6.1, 5.3, and 5.2,
respectively). Two of these samples are shared with the
co-expression module described above, indicating that genes
in both co-expression modules are partly involved in a
particular biological process that is influenced by hypoxic
treatment.
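
The overlap between the specific experiments of the two
modules can be checked with a simple set comparison. The
sample identifiers and Z-scores below are those reported
above; the variable names are chosen for illustration, and the
sketch in Python is not part of the database itself.

```python
# Z-scores of the root hypoxia samples reported for each module.
module_1 = {"GSM328533": 6.8, "GSM328532": 6.7}
module_2 = {"GSM328484": 6.1, "GSM328533": 5.3, "GSM328532": 5.2}

# Samples in which both modules are specifically expressed.
shared = sorted(set(module_1) & set(module_2))

# Samples that are specific to the second module only.
only_module_2 = sorted(set(module_2) - set(module_1))
```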
Fig. 2. Functional descriptions of a co-expression module in which
genes are specifically expressed in experiments of hypoxic stress
response. The “Descriptions” part includes genes involved in cell
wall biosynthesis. The “Short description” column, which shows
functional descriptions of homologous Arabidopsis genes, contains
“TUB2 (tubulin beta-2),” “microtubule-associated protein
(MAP65/ASE1) family,” “COBL4/IRX6 (COBRA-LIKE4),” “FLA11
(fasciclin-like arabinogalactan-protein 11),” and “GAUT12/IRX8/
LGT6 (GALACTURONOSYLTRANSFERASE 12).” In the “Specific
Experiments” part, the “Z score” values of two samples for an
experiment for root hypoxia are specifically high (6.8 and 6.7,
respectively), indicating that these module members may be
specifically expressed through root hypoxic stress response


Discussion 

The database constructed in the present study provides
information on poplar co-expression modules for predicting
gene function and thereby contributes to understanding the
mechanisms of biological processes, including that of stress
response. Using an identifier of a gene of interest or query
words related to a gene of interest, users can retrieve a
co-expression module that is composed of genes co-expressed
with the gene. The database also provides a way to
retrieve co-expression modules with tight intramodular
connections using the term “confeito” as a query word. This
term returns a list of co-expression modules in which
genes are tightly interconnected. To display a page
including information on a co-expression module of
interest, users can select the module from the list without
prior knowledge of gene function or biological processes.

Our database also provides information from experiments
in which genes in a co-expression module are specifically
expressed. Public databases of plant co-expression
analyses such as ATTED-II (http://atted.jp/), 27 CSB.DB
(http://csbdb.mpimp-golm.mpg.de/), 30 ACT
(http://www.arabidopsis.leeds.ac.uk/act/), 31 and AspenDB
are designed to extract co-expressed genes using
gene-to-gene correlation data based on publicly available
DNA microarray datasets. The correlation data between
co-expressed genes are strongly influenced by gene
expression data in experiments in which the genes are
specifically expressed. Information on such experiments is
useful for characterizing genes whose functions are
unknown, particularly in co-expression analysis of poplar,
for which little information on gene function is available.
Therefore, we provide such information on experiments to
associate them with co-expressed genes that are
specifically expressed in the experiments.
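
The influence of a single experiment on a correlation value
can be made concrete by splitting the cosine correlation into
one term per experiment. The following is a sketch in Python;
the function name and the two standardized profiles are
hypothetical and serve only to illustrate the point made
above.

```python
import math


def contributions(x, y):
    """Split the cosine correlation of two profiles by experiment.

    Returns one term per experiment; the terms sum to the cosine
    correlation of x and y.
    """
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return [a * b / norm for a, b in zip(x, y)]


# Hypothetical standardized profiles over five experiments; both genes
# are strongly expressed only in the last one.
gene_x = [-0.5, -0.5, -0.5, -0.5, 2.0]
gene_y = [-0.4, -0.6, -0.5, -0.5, 2.0]
terms = contributions(gene_x, gene_y)
correlation = sum(terms)
dominant = max(range(len(terms)), key=lambda i: terms[i])
```

In this toy case the last experiment accounts for most of the
correlation, which is why listing such experiments alongside a
module helps in interpreting it.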

The co-expression modules described in the Results
section indicate that co-expression analysis using poplar
gene expression datasets is useful for extracting genes that
are upregulated by stress treatments, such as hypoxic
stress. The datasets for the present research, composed of
95 chips, include 30 chips of DNA microarrays for hypoxic
stress experiments, which led to the extraction of
co-expression modules involved in stress response. For
predicting genes involved in various biological processes
and understanding their mechanisms, gene expression
datasets for a wide variety of experiments are required.
Although bioinformatics approaches such as co-expression
analysis do not in themselves provide evidence that
identifies gene function, they can provide genome-wide
information that contributes to the identification.

The combination of transcriptome data with different
omics data such as genome, proteome, and interactome may
lead to better understanding of gene co-expression. Genomic
data of plants are available at the TIGR Plant Transcript
Assemblies database (http://plantta.jcvi.org/) 31 and the
GeneChip Oncology Database
(http://compbio.dfci.harvard.edu/tgi/plant.html). 32
Arabidopsis omics data including that of the genome,
transcriptome, and proteome have been accumulated and are
available at the TAIR database (http://arabidopsis.org/). 33
For combined analysis across omics data, more omics data of
plants such as poplar are required. Such a combination of
omics information allows plant biologists to understand the
functionality of co-expression modules on the basis of
further knowledge of molecular biology.